
Overview

Phase 2 enriches the core dataset by fetching regulatory filings, news, corporate actions, surveillance lists, and market-specific data. All scripts in this phase depend on master_isin_map.json from Phase 1.
Phase 2 also includes Phase 2.5 (OHLCV data), which is toggled via FETCH_OHLCV = True/False in run_full_pipeline.py.

Execution Order

Phase 2 runs 13 scripts, in parallel where possible:
  1. Company Filings (fetch_company_filings.py): hybrid LODR + legacy filings, threaded
  2. Announcements (fetch_new_announcements.py): live corporate announcements, 40 threads
  3. Advanced Indicators (fetch_advanced_indicators.py): pivot points and EMA/SMA signals, 50 threads
  4. Market News (fetch_market_news.py): AI sentiment-tagged news, 15 threads
  5. Corporate Actions (fetch_corporate_actions.py): dividends, bonus issues, and splits (history + upcoming)
  6. Surveillance Lists (fetch_surveillance_lists.py): ASM/GSM lists
  7. Circuit Stocks (fetch_circuit_stocks.py): stocks at upper/lower circuit
  8. Bulk/Block Deals (fetch_bulk_block_deals.py): deals from the last 30 days
  9. Incremental Price Bands (fetch_incremental_price_bands.py): daily band revisions
  10. Complete Price Bands (fetch_complete_price_bands.py): bands for all securities
  11. All Indices (fetch_all_indices.py): 194 market indices
  12. OHLCV Data, Phase 2.5 (fetch_all_ohlcv.py): historical stock OHLCV with smart incremental updates
  13. Indices OHLCV, Phase 2.5 (fetch_indices_ohlcv.py): historical index OHLCV

Script 1: fetch_company_filings.py

Purpose

Fetches regulatory filings from two endpoints (LODR and legacy) and merges the results for maximum coverage.

API Endpoints

POST https://ow-static-scanx.dhan.co/staticscanx/lodr

Request Payload

{
  "data": {
    "isin": "INE002A01018",
    "pg_no": 1,
    "count": 100
  }
}

Output Files

File Pattern | Description | Count
company_filings/{SYMBOL}_filings.json | Merged filings per stock | 2,775 files

Deduplication Logic

Filings are deduplicated on the composite key news_id + news_date + caption, so a filing returned by both endpoints is kept only once.
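A minimal sketch of that merge, assuming each endpoint's response has already been parsed into a list of flat dicts carrying news_id, news_date, and caption keys. The function name dedupe_filings and the sample records are hypothetical. The first occurrence of a key wins, so the preferred endpoint should be passed first.

```python
from typing import Iterable


def dedupe_filings(*sources: Iterable[dict]) -> list[dict]:
    """Merge filing lists, keeping the first record seen for each
    (news_id, news_date, caption) key."""
    seen: set[tuple] = set()
    merged: list[dict] = []
    for source in sources:
        for filing in source:
            key = (filing.get("news_id"), filing.get("news_date"), filing.get("caption"))
            if key in seen:
                continue  # same filing already taken from an earlier source
            seen.add(key)
            merged.append(filing)
    return merged


# Hypothetical records: the first legacy item duplicates the LODR item.
lodr = [{"news_id": 1, "news_date": "2024-03-01", "caption": "Board Meeting"}]
legacy = [
    {"news_id": 1, "news_date": "2024-03-01", "caption": "Board Meeting"},
    {"news_id": 2, "news_date": "2024-02-15", "caption": "Outcome of AGM"},
]
merged = dedupe_filings(lodr, legacy)
print(len(merged))  # 2
```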

Threading

  • Workers: 20 concurrent threads
  • Typical Time: ~3-5 minutes
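The script's threading code is not shown on this page. The sketch below is one common way to get the same behavior with concurrent.futures, assuming a per-stock fetch function (fetch_one, hypothetical) that raises on failure; the default of 20 workers matches the figure above. Errors are stored per symbol so that one bad stock does not abort the whole pool.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def fetch_all(stocks: list[dict], fetch_one: Callable[[dict], object],
              workers: int = 20) -> dict[str, object]:
    """Run fetch_one for every stock on a thread pool, keyed by symbol."""
    results: dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch_one, stock): stock["Symbol"] for stock in stocks}
        for future in as_completed(futures):
            symbol = futures[future]
            try:
                results[symbol] = future.result()
            except Exception as exc:  # record the failure and keep going
                results[symbol] = exc
    return results


def fake_fetch(stock: dict) -> str:
    """Stand-in for the real HTTP call; fails for one symbol on purpose."""
    if stock["Symbol"] == "TCS":
        raise RuntimeError("simulated timeout")
    return f"{stock['Symbol']}_filings.json"


results = fetch_all([{"Symbol": "RELIANCE"}, {"Symbol": "TCS"}], fake_fetch, workers=2)
```

The same pattern would apply to the other threaded scripts on this page with a different worker count.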

Script 2: fetch_new_announcements.py

Purpose

Fetches live corporate announcements (non-regulatory) from Dhan.

API Endpoint

POST https://ow-static-scanx.dhan.co/staticscanx/announcements

Request Payload

{
  "data": {
    "isin": "INE002A01018"
  }
}

Output Files

File | Description | Size
all_company_announcements.json | All announcements across stocks | ~8 MB

Threading

  • Workers: 40 concurrent threads
  • Typical Time: ~2-3 minutes

Script 3: fetch_advanced_indicators.py

Purpose

Fetches technical indicators: Pivot Points, EMA/SMA signals, MACD, RSI sentiment.

API Endpoint

POST https://ow-static-scanx.dhan.co/staticscanx/indicator

Request Payload

{
  "exchange": "NSE",
  "segment": "E",
  "security_id": "2885",
  "isin": "INE002A01018",
  "symbol": "RELIANCE",
  "minute": "D"
}
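Each request body can be built from a master_isin_map.json entry. The mapping below (Sid to security_id, ISIN to isin, Symbol to symbol) is an assumption based on the Phase 1 field names used later on this page; segment and minute are simply copied from the example payload. build_indicator_payload is a hypothetical name.

```python
def build_indicator_payload(stock: dict, exchange: str = "NSE") -> dict:
    """Map one master_isin_map.json entry onto the /indicator request body."""
    return {
        "exchange": exchange,
        "segment": "E",                    # copied from the example payload
        "security_id": str(stock["Sid"]),  # example sends the id as a string
        "isin": stock["ISIN"],
        "symbol": stock["Symbol"],
        "minute": "D",                     # copied from the example payload
    }


payload = build_indicator_payload({"ISIN": "INE002A01018", "Sid": 2885, "Symbol": "RELIANCE"})
```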

Output Files

File | Description | Size
advanced_indicator_data.json | All technical indicators | ~8.3 MB

Data Fetched

  • Pivot Point (daily)
  • SMA Status (20, 50, 200)
  • EMA Status (20, 200)
  • RSI Sentiment
  • MACD Sentiment

Threading

  • Workers: 50 concurrent threads
  • Typical Time: ~2 minutes

Script 4: fetch_market_news.py

Purpose

Fetches AI sentiment-tagged news (up to 50 articles per stock).

API Endpoint

POST https://news-live.dhan.co/v2/news/getLiveNews

Request Payload

{
  "categories": ["ALL"],
  "page_no": 0,
  "limit": 50,
  "first_news_timeStamp": 0,
  "last_news_timeStamp": 0,
  "news_feed_type": "live",
  "stock_list": ["INE002A01018"],
  "entity_id": ""
}

Output Files

File Pattern | Description | Count
market_news/{SYMBOL}_news.json | News per stock | 2,775 files

Pagination

  • Default limit: 50 articles per stock
  • Maximum tested: 100 per stock (by iterating page_no)
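A sketch of the page_no iteration, assuming a caller-supplied post function (hypothetical) that sends the payload and returns the list of articles from the response, and assuming that a page shorter than limit means there is nothing more to fetch. The payload fields mirror the getLiveNews example above.

```python
from typing import Callable


def build_news_payload(isin: str, page_no: int, limit: int = 50) -> dict:
    """Request body mirroring the getLiveNews example."""
    return {
        "categories": ["ALL"],
        "page_no": page_no,
        "limit": limit,
        "first_news_timeStamp": 0,
        "last_news_timeStamp": 0,
        "news_feed_type": "live",
        "stock_list": [isin],
        "entity_id": "",
    }


def fetch_news(isin: str, post: Callable[[dict], list],
               max_articles: int = 100, page_size: int = 50) -> list:
    """Walk page_no from 0 until max_articles are collected or a short page arrives."""
    articles: list = []
    page_no = 0
    while len(articles) < max_articles:
        page = post(build_news_payload(isin, page_no, page_size))
        articles.extend(page)
        if len(page) < page_size:
            break  # short page: no more news for this stock
        page_no += 1
    return articles[:max_articles]


# Fake endpoint holding 120 articles for one stock.
CORPUS = [{"id": i} for i in range(120)]


def fake_post(payload: dict) -> list:
    start = payload["page_no"] * payload["limit"]
    return CORPUS[start:start + payload["limit"]]


news = fetch_news("INE002A01018", fake_post)
```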

Threading

  • Workers: 15 concurrent threads
  • Typical Time: ~4-6 minutes

Script 5: fetch_corporate_actions.py

Purpose

Fetches corporate actions (dividends, bonus, splits) in two modes:
  • History: Last 2 years
  • Upcoming: Next 2 months

API Endpoint

POST https://ow-scanx-analytics.dhan.co/customscan/fetchdt

Request Payload

{
  "data": {
    "type": "full",
    "whichpage": "corporate_action",
    "filters": [
      {
        "field": "CorpAct.ActDate",
        "op": "GT",
        "val": "2026-03-03"
      }
    ],
    "count": 5000,
    "page": 1
  }
}
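A sketch of building the two request bodies. Only a single GT filter on CorpAct.ActDate is documented, so this assumes history starts roughly two years (730 days) back and upcoming starts today, with any upper bound (today for history, about two months ahead for upcoming) applied to the results after the call. corp_action_payload is a hypothetical name.

```python
from datetime import date, timedelta


def corp_action_payload(after: date, count: int = 5000, page: int = 1) -> dict:
    """Build the /customscan/fetchdt body for actions dated after `after`."""
    return {
        "data": {
            "type": "full",
            "whichpage": "corporate_action",
            "filters": [
                {"field": "CorpAct.ActDate", "op": "GT", "val": after.isoformat()}
            ],
            "count": count,
            "page": page,
        }
    }


today = date(2026, 3, 3)  # fixed for the example; date.today() in a real run
upcoming = corp_action_payload(today)
history = corp_action_payload(today - timedelta(days=730))  # roughly 2 years back
```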

Output Files

File | Description | Timeframe
upcoming_corporate_actions.json | Future actions | Next 2 months
history_corporate_actions.json | Past actions | Last 2 years

Typical Time

~10-15 seconds — Two API calls with count: 5000

Script 6: fetch_surveillance_lists.py

Purpose

Fetches NSE ASM/GSM surveillance lists (stocks under additional surveillance measures).

Data Sources

  1. Primary: Google Sheets Gviz endpoint (NSE-hosted)
  2. Fallback: Dhan Next.js API
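The primary/fallback order above can be sketched as a generic "first source that works" helper. This assumes each source is wrapped in a zero-argument fetcher (gviz_fetch and dhan_fetch here are hypothetical stand-ins) that returns parsed rows or raises; an empty result is also treated as a failure so the fallback gets a chance.

```python
from typing import Callable, Sequence


def fetch_with_fallback(
    sources: Sequence[tuple[str, Callable[[], list]]],
) -> tuple[str, list]:
    """Try each (name, fetcher) in order and return the first non-empty result."""
    errors: list[str] = []
    for name, fetcher in sources:
        try:
            rows = fetcher()
        except Exception as exc:  # network error, unparseable CSV, schema change
            errors.append(f"{name}: {exc}")
            continue
        if rows:
            return name, rows
        errors.append(f"{name}: empty response")
    raise RuntimeError("all sources failed: " + "; ".join(errors))


def gviz_fetch() -> list:
    raise ConnectionError("sheet unreachable")  # simulate the primary failing


def dhan_fetch() -> list:
    return [{"symbol": "EXAMPLE"}]  # placeholder row


source, rows = fetch_with_fallback([("gviz", gviz_fetch), ("dhan_nextjs", dhan_fetch)])
```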

Output Files

File | Description
nse_asm_list.json | Additional Surveillance Measure (ASM) stocks
nse_gsm_list.json | Graded Surveillance Measure (GSM) stocks

Typical Time

~5-10 seconds — Direct CSV/API fetch

Script 7: fetch_circuit_stocks.py

Purpose

Fetches stocks that hit upper/lower circuit limits today.

API Endpoint

POST https://ow-scanx-analytics.dhan.co/customscan/fetchdt

Request Payload

Upper Circuit Filter
{
  "data": {
    "type": "full",
    "whichpage": "nse_total_market",
    "filters": [
      {"field": "UpperCircuit", "op": "EQ", "val": "1"}
    ],
    "count": 500,
    "page": 1
  }
}
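Only the upper-circuit filter is documented. The sketch assumes the lower-circuit call differs only in its field name; LowerCircuit is a guess that should be checked against the real script. circuit_payload is a hypothetical helper.

```python
def circuit_payload(field: str, count: int = 500, page: int = 1) -> dict:
    """Build a /customscan/fetchdt body that filters nse_total_market on one flag."""
    return {
        "data": {
            "type": "full",
            "whichpage": "nse_total_market",
            "filters": [{"field": field, "op": "EQ", "val": "1"}],
            "count": count,
            "page": page,
        }
    }


# One request per output file; "LowerCircuit" is an assumed field name.
requests_by_file = {
    "upper_circuit_stocks.json": circuit_payload("UpperCircuit"),
    "lower_circuit_stocks.json": circuit_payload("LowerCircuit"),
}
```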

Output Files

File | Description
upper_circuit_stocks.json | Stocks at upper circuit
lower_circuit_stocks.json | Stocks at lower circuit

Typical Time

~5 seconds — Two API calls

Script 8: fetch_bulk_block_deals.py

Purpose

Fetches bulk/block deals from the last 30 days.

API Endpoint

POST https://ow-static-scanx.dhan.co/staticscanx/deal

Request Payload

{
  "data": {
    "defaultpage": "N",
    "pageno": 1,
    "pagecount": 50
  }
}

Output Files

File | Description
bulk_block_deals.json | All deals (auto-paginated)

Pagination

Auto-paginates through all pages (50 records/page).
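A minimal sketch of that loop, assuming a caller-supplied post function (hypothetical) that returns the list of deals for one page, and assuming a short or empty page marks the end. The max_pages cap is an added safety guard, not something the source describes.

```python
from typing import Callable


def fetch_all_deals(post: Callable[[dict], list], page_size: int = 50,
                    max_pages: int = 1000) -> list:
    """Request pageno 1, 2, 3, ... until a page comes back short."""
    deals: list = []
    for pageno in range(1, max_pages + 1):  # max_pages guards against endless loops
        payload = {"data": {"defaultpage": "N", "pageno": pageno, "pagecount": page_size}}
        batch = post(payload)
        deals.extend(batch)
        if len(batch) < page_size:
            break  # short or empty page: no more deals
    return deals


# Fake endpoint holding 120 deals, served 50 per page.
ALL_DEALS = [{"deal_id": i} for i in range(120)]


def fake_post(payload: dict) -> list:
    start = (payload["data"]["pageno"] - 1) * payload["data"]["pagecount"]
    return ALL_DEALS[start:start + payload["data"]["pagecount"]]


deals = fetch_all_deals(fake_post)
```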

Typical Time

~10-15 seconds — Depends on total deals

Scripts 9 & 10: Price Bands

fetch_incremental_price_bands.py

Fetches daily price band changes from NSE.
GET https://nsearchives.nseindia.com/content/equities/eq_band_changes_{date}.csv
Output: incremental_price_bands.json

fetch_complete_price_bands.py

Fetches the complete price band list for all securities.
GET https://nsearchives.nseindia.com/content/equities/sec_list_{date}.csv
Output: complete_price_bands.json

Typical Time

~5 seconds each — Direct CSV downloads
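The {date} placeholder format is not stated here; the sketch assumes DDMMYYYY (for example 03032026), which should be confirmed against the archive. It also shows one way to turn a downloaded CSV into JSON-ready records with the standard csv module. The sample header is illustrative, not NSE's actual column list, and the HTTP download itself is omitted.

```python
import csv
import io
from datetime import date

BASE = "https://nsearchives.nseindia.com/content/equities"


def band_urls(day: date) -> dict[str, str]:
    """Map each output file to its source CSV URL for the given trading day."""
    stamp = day.strftime("%d%m%Y")  # assumed DDMMYYYY format, e.g. 03032026
    return {
        "incremental_price_bands.json": f"{BASE}/eq_band_changes_{stamp}.csv",
        "complete_price_bands.json": f"{BASE}/sec_list_{stamp}.csv",
    }


def csv_to_records(text: str) -> list[dict]:
    """Parse CSV text into dicts with whitespace-trimmed keys and values."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key.strip(): (value or "").strip() for key, value in row.items() if key is not None}
        for row in reader
    ]


urls = band_urls(date(2026, 3, 3))
# Illustrative CSV; the real files' headers may differ.
records = csv_to_records("Symbol, Series ,Band\nRELIANCE,EQ,20\n")
```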

Script 11: fetch_all_indices.py

Purpose

Fetches all 194 market indices (NIFTY 50, BANKNIFTY, sectoral indices).

API Endpoint

POST https://ow-scanx-analytics.dhan.co/customscan/fetchdt

Output Files

File | Description | Records
all_indices_list.json | All NSE indices | 194

Typical Time

~5 seconds — Single API call

Phase 2.5: OHLCV Data (Optional)

Phase 2.5 is controlled by FETCH_OHLCV = True/False in run_full_pipeline.py. If it is disabled, the ADR, RVOL, and ATH fields will be 0.

Script 12: fetch_all_ohlcv.py

Purpose

Fetches lifetime daily OHLCV candles for all stocks with smart incremental updates.

API Endpoint

POST https://openweb-ticks.dhan.co/getDataH

Request Payload

{
  "EXCH": "NSE",
  "SYM": "RELIANCE",
  "SEG": "E",
  "INST": "EQUITY",
  "SEC_ID": "2885",
  "EXPCODE": 0,
  "INTERVAL": "D",
  "START": 215634600,
  "END": 1709452800
}

Output Files

File Pattern | Description | Count
ohlcv_data/{SYMBOL}.csv | Daily OHLCV candles | 2,775 files

Smart Incremental Update

  • First run: Fetches full history (~30 minutes)
  • Subsequent runs: Only fetches missing days (~2-5 minutes)
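A sketch of how the resume point could be derived from an existing file, assuming the CSV layout shown under CSV Structure and assuming the next START is one day after the newest stored Timestamp. The default 215634600 is the START value from the example payload; merging the new candles into the file is not shown, and the helper names are hypothetical.

```python
import csv
import io
import os

DAY = 86_400  # seconds per day


def next_start_from_text(csv_text: str, default_start: int) -> int:
    """START for the next request: one day after the newest stored candle."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return default_start  # empty or header-only file: fetch full history
    last_ts = max(int(row["Timestamp"]) for row in rows)
    return last_ts + DAY


def next_start(csv_path: str, default_start: int) -> int:
    """Read ohlcv_data/{SYMBOL}.csv if it exists, else fall back to full history."""
    if not os.path.exists(csv_path):
        return default_start  # first run: no file yet
    with open(csv_path, newline="", encoding="utf-8") as f:
        return next_start_from_text(f.read(), default_start)


SAMPLE = (
    "Timestamp,Open,High,Low,Close,Volume\n"
    "1647878400,2450.50,2475.00,2445.00,2468.75,12500000\n"
    "1647964800,2470.00,2485.50,2460.00,2478.30,11800000\n"
)
start = next_start_from_text(SAMPLE, default_start=215634600)  # 1647964800 + 86400
```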

CSV Structure

Timestamp,Open,High,Low,Close,Volume
1647878400,2450.50,2475.00,2445.00,2468.75,12500000
1647964800,2470.00,2485.50,2460.00,2478.30,11800000

Threading

  • Workers: 15 concurrent threads
  • Typical Time:
    • First run: ~30 minutes
    • Incremental: ~2-5 minutes

Script 13: fetch_indices_ohlcv.py

Purpose

Fetches historical OHLCV for all 194 indices using a high-speed, index-specific endpoint.

Output Files

File Pattern | Description
ohlcv_data/indices/{INDEX}.csv | Index OHLCV candles

Typical Time

~1-2 minutes — Optimized for indices

Phase 2 Output Summary

Files Produced

📦 Phase 2 Outputs:
├─ company_filings/
│  ├─ RELIANCE_filings.json
│  ├─ TCS_filings.json
│  └─ ... (2,775 files)
├─ market_news/
│  ├─ RELIANCE_news.json
│  ├─ TCS_news.json
│  └─ ... (2,775 files)
├─ all_company_announcements.json      (~8 MB)
├─ advanced_indicator_data.json        (~8.3 MB)
├─ upcoming_corporate_actions.json     (~500 KB)
├─ history_corporate_actions.json      (~2 MB)
├─ nse_asm_list.json
├─ nse_gsm_list.json
├─ upper_circuit_stocks.json
├─ lower_circuit_stocks.json
├─ bulk_block_deals.json
├─ incremental_price_bands.json
├─ complete_price_bands.json
├─ all_indices_list.json
└─ ohlcv_data/                          (if FETCH_OHLCV=True)
   ├─ RELIANCE.csv
   ├─ TCS.csv
   ├─ ... (2,775 files)
   └─ indices/                          (index OHLCV from fetch_indices_ohlcv.py)

Performance Metrics

Total Phase 2 Time (without OHLCV)

~8-12 minutes for all enrichment scripts

Total Phase 2 Time (with OHLCV)

  • First run: ~35-40 minutes
  • Incremental: ~10-15 minutes

Bottlenecks

  1. OHLCV fetch: Largest time consumer (~30 min first run)
  2. News fetch: 2,775 stocks spread across 15 threads (~5 min)
  3. Filings fetch: Dual endpoint merge (~4 min)
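Treating a threaded fetch as a simple queue gives a back-of-the-envelope check on these figures. Assuming uniform request latency and no rate limiting (both simplifications), 2,775 stocks over 15 workers is 185 requests per worker, so ~5 minutes implies roughly 1.6 s per request. The helper names are hypothetical.

```python
import math


def estimate_seconds(requests: int, workers: int, latency_s: float) -> float:
    """Wall time if every request takes latency_s and all workers stay busy."""
    per_worker = math.ceil(requests / workers)  # requests each worker handles in sequence
    return per_worker * latency_s


def implied_latency(requests: int, workers: int, wall_s: float) -> float:
    """Average per-request time implied by an observed wall-clock duration."""
    return wall_s / math.ceil(requests / workers)


print(math.ceil(2775 / 15))                      # 185 requests per worker
print(round(implied_latency(2775, 15, 300), 2))  # 1.62 s per request for ~5 min
```

The same estimate shows why raising the worker count would shorten the run, provided the API tolerates the extra concurrency.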

Dependencies on Phase 1

All Phase 2 scripts require master_isin_map.json from Phase 1:
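import json

# Load the stock universe produced by Phase 1
with open("master_isin_map.json", "r", encoding="utf-8") as f:
    master_map = json.load(f)

for stock in master_map:
    isin, sec_id, symbol = stock["ISIN"], stock["Sid"], stock["Symbol"]
    # ...build the per-stock request payloads from isin / sec_id / symbol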

Error Handling

Phase 2 scripts use soft failure mode:
results[script] = run_script(script, "Phase 2")
# Pipeline continues even if enrichment scripts fail
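The run_script helper itself is not shown on this page. Below is a minimal, hypothetical sketch of how such a soft-failure runner could work, assuming each script runs as a child Python process and that a non-zero exit code means failure. It runs scripts sequentially for clarity, whereas the pipeline runs them in parallel where possible.

```python
import subprocess
import sys


def run_script(script: str, phase: str) -> bool:
    """Run one script in a child interpreter and report failure instead of raising."""
    completed = subprocess.run([sys.executable, script], check=False)
    ok = completed.returncode == 0
    status = "OK" if ok else f"FAILED (exit {completed.returncode})"
    print(f"[{phase}] {script}: {status}")
    return ok


def run_phase(scripts: list[str], phase: str) -> dict[str, bool]:
    """Run every script in order; a failure is recorded but never stops the loop."""
    results: dict[str, bool] = {}
    for script in scripts:
        results[script] = run_script(script, phase)
    return results


# Example (not executed here):
# results = run_phase(["fetch_market_news.py", "fetch_all_ohlcv.py"], "Phase 2")
# failed = [name for name, ok in results.items() if not ok]
```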

Impact of Failures

  • Filings fail: Event markers will miss filing-based triggers
  • News fail: News Feed field will be empty
  • OHLCV fail: ADR, RVOL, ATH metrics will be 0
  • Corporate actions fail: Event markers will miss dividend/bonus/split icons

Next Phase

Once Phase 2 completes, the pipeline proceeds to:

Phase 3: Base Analysis

Builds the master all_stocks_fundamental_analysis.json by merging all Phase 1 and Phase 2 data.